
Add more property-caching optimizations to x509 Rust backend #14441

Open
abbra wants to merge 10 commits into pyca:main from abbra:abbra-p-f

Conversation


@abbra abbra commented Mar 8, 2026

PyCA x509 Rust backend — caching optimizations

I was working on my general-purpose ASN.1 library, and while creating Python bindings I tested against the PyCA code. Some operations in PyCA were slow compared to my code, so I wanted to look into what could be improved. With the help of Claude Code, I improved several recurring patterns using the same approach PyCA already had in place for some properties.

Below is a report Claude created.

Background

The x509 Rust backend (src/rust/src/x509/) converts parsed ASN.1 data into Python objects on every property access. Operations like name parsing (parse_name), public-key loading, OID conversion, and serial-number iteration are not cheap: each one allocates Python objects, traverses ASN.1 sequences, and crosses the Rust/Python FFI boundary. In workloads that touch the same property more than once on the same object (chain building, path validation, CRL checking, OCSP processing) this cost is paid repeatedly and unnecessarily.

The existing mitigation is pyo3::sync::PyOnceLock<pyo3::Py<pyo3::PyAny>>: a thread-safe write-once cell that stores the Python object after the first computation. It was already used for extension lists everywhere. The work described here extends that pattern to the remaining uncached properties.

Caching pattern

Every cached getter follows the same idiom:

```rust
// struct field
cached_foo: pyo3::sync::PyOnceLock<pyo3::Py<pyo3::PyAny>>,

// getter
fn foo<'p>(&self, py: Python<'p>) -> PyResult<Bound<'p, PyAny>> {
    Ok(self.cached_foo
        .get_or_try_init(py, || expensive_computation(py).map(|v| v.unbind()))?
        .bind(py)
        .clone())
}
```

After the first call, get_or_try_init is effectively a no-op: the atomic check costs ~50 ns and the cached result is returned without any new allocation.
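The same write-once semantics can be sketched in Python. This is a hypothetical CachedCert stand-in for illustration, not the real binding; PyOnceLock's atomics are approximated here with a double-checked lock:

```python
import threading

class CachedCert:
    """Hypothetical Python analogue of the PyOnceLock idiom: compute the
    property once, then return the same object on every later access."""

    def __init__(self, der: bytes):
        self._der = der
        self._subject = None           # plays the role of the PyOnceLock cell
        self._lock = threading.Lock()  # PyOnceLock is thread-safe; so is this

    @property
    def subject(self):
        if self._subject is None:          # fast path: cache already filled
            with self._lock:
                if self._subject is None:  # double-checked under the lock
                    self._subject = self._expensive_parse()
        return self._subject

    def _expensive_parse(self):
        # Stand-in for parse_name: allocates a fresh object from raw bytes.
        return list(self._der)

cert = CachedCert(b"\x30\x03\x02\x01\x01")
assert cert.subject is cert.subject  # same object on repeated access
```

The fast path costs only a None check and an attribute load, which mirrors why the warm-path numbers below are flat regardless of how expensive the first computation was.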

What was implemented

Ten changes were made across five files, committed individually on the performance-improvements branch.

| Commit | File | Change |
| --- | --- | --- |
| e7cc638 | csr.rs | Cache CertificateSigningRequest.attributes |
| 75451dc | ocsp_req.rs | Cache OCSPRequest issuer_name_hash, issuer_key_hash, hash_algorithm, serial_number |
| 1830ca3 | certificate.rs, pkcs7.rs, ocsp_resp.rs | Cache Certificate issuer, subject, public_key, signature_algorithm_oid, signature_hash_algorithm |
| e90f4e5 | ocsp_resp.rs | Fix O(n²) certificates iteration; cache the resulting list |
| 986298b | ocsp_resp.rs | Add OCSPSingleResponse.extensions getter with caching |
| f18d144 | crl.rs | Cache CRL issuer, signature_algorithm_oid, signature_hash_algorithm |
| d149aaf | crl.rs | Replace get_revoked_certificate_by_serial_number linear scan with O(1) HashMap |

The OCSPResponse.certificates getter additionally had a documented O(n²) bug (each certificate extracted via clone().nth(i) restarted the iterator). It was replaced with a single linear pass using asn1::write_single to produce independent DER bytes for each certificate, eliminating the need for the map_arc_data_ocsp_response unsafe helper.
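The iterator-restart cost behind that bug can be illustrated in Python. These helpers are hypothetical; Rust's clone().nth(i) on a parse iterator behaves like calling iter() again and skipping i elements each time:

```python
def quadratic(items):
    """O(n²): index a one-shot iterator by restarting it per element,
    mimicking the old clone().nth(i) pattern."""
    out = []
    for i in range(len(items)):
        it = iter(items)       # "clone" restarts from the beginning
        for _ in range(i):     # nth(i): re-skip the first i elements
            next(it)
        out.append(next(it))
    return out

def linear(items):
    """O(n): a single pass, as in the fixed implementation."""
    return [x for x in items]

data = ["cert0", "cert1", "cert2"]
assert quadratic(data) == linear(data) == data
```

Both produce the same list, but the quadratic version re-walks the prefix for every element, which is exactly the work the single linear pass eliminates.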

Benchmark results

Benchmarks measure repeated access on a single pre-loaded object — the workload that caching is designed to accelerate. Each benchmark creates the object once outside the timed loop, then calls the getter in a tight loop.
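The cold-path versus warm-path distinction can be sketched without pytest-benchmark. The bench helper below is a minimal, illustrative stand-in, not the actual benchmark harness:

```python
import time

def bench(fn, rounds=1000):
    """Minimal stand-in for pytest-benchmark: median per-call time in ns."""
    samples = []
    for _ in range(rounds):
        t0 = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - t0)
    samples.sort()
    return samples[len(samples) // 2]

# Cold path (what the existing load benchmarks measure): a fresh object,
# and therefore an empty cache, on every iteration:
#   bench(lambda: x509.load_der_x509_certificate(der).subject)
#
# Warm path (what the new benchmarks measure): the object is created once
# outside the timed loop, so every call after the first hits the cache:
#   cert = x509.load_der_x509_certificate(der)
#   bench(lambda: cert.subject)
```

Only the warm-path loop can observe a caching win, which is why the new benchmarks construct the object once up front.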

Comparison: main (baseline) vs abbra-p-f (PR), both built with maturin develop --release, Python 3.14, OpenSSL 3.5.

| Benchmark | Baseline (median) | PR (median) | Speedup |
| --- | --- | --- | --- |
| certificate_subject | 7141 ns | 56 ns | 99% faster |
| certificate_issuer | 5626 ns | 56 ns | 99% faster |
| crl_issuer | 5460 ns | 56 ns | 99% faster |
| certificate_public_key | 1361 ns | 55 ns | 96% faster |
| ocsp_request_properties | 1525 ns | 119 ns | 92% faster |
| crl_serial_number_lookup_miss | 2159 ns | 224 ns | 90% faster |
| certificate_signature_hash_algorithm | 190 ns | 56 ns | 71% faster |
| certificate_signature_algorithm_oid | 109 ns | 56 ns | 49% faster |
| crl_serial_number_lookup_hit | 448 ns | 295 ns | 34% faster |
| ocsp_response_properties | 1190 ns | 1150 ns | ~3% (noise) |

The subject/issuer/CRL-issuer gains are ~100× because parse_name is the most expensive operation — it constructs a full Python Name object tree from ASN.1 on every call. The cached path costs only an atomic load plus a Python reference clone (~50 ns regardless of name complexity).

crl_serial_number_lookup_hit is 34% faster rather than near-zero because get_revoked_certificate_by_serial_number must still construct a new RevokedCertificate Python object on each hit (the HashMap stores OwnedRevokedCertificate values that are cloned per call). The miss path (90% faster) avoids iterating the whole list and drops from O(n) to O(1).
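The lazy index build plus per-hit object construction can be sketched in Python. RevokedIndex is a hypothetical stand-in for the CRL change, not the real API:

```python
class RevokedIndex:
    """Hypothetical sketch: build a serial -> entry map lazily on the
    first lookup, then answer later lookups in O(1)."""

    def __init__(self, revoked):
        self._revoked = revoked  # list of (serial, entry) pairs
        self._index = None

    def get_by_serial(self, serial):
        if self._index is None:            # first call: O(n) build
            self._index = dict(self._revoked)
        entry = self._index.get(serial)    # every call: O(1) lookup
        # Mirrors the real code: a fresh RevokedCertificate-like object is
        # still constructed per hit, so the hit path is not near-zero cost.
        return dict(entry) if entry is not None else None

idx = RevokedIndex([(1, {"serial": 1}), (2, {"serial": 2})])
assert idx.get_by_serial(2) == {"serial": 2}
assert idx.get_by_serial(99) is None  # miss: no scan of the whole list
```

The miss path benefits most because it previously paid the full O(n) scan only to find nothing; the hit path still pays for the per-call object construction.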

ocsp_response_properties shows no meaningful change because the properties benchmarked there (issuer_key_hash, serial_number, signature_hash_algorithm on the response-level object) were already relatively cheap and the test exercises only a few iterations of the warm path.

Why the existing load benchmarks showed no improvement

test_load_der_certificate and test_load_pem_certificate each call x509.load_der_x509_certificate(bytes) per iteration, creating a fresh object with empty caches each time. The cache is always cold; caching adds zero benefit and a tiny overhead (extra PyOnceLock::new() fields). These benchmarks measure parsing throughput, not property-access throughput, so they are unaffected by this work.

Benchmark reproduction

```shell
uv venv /tmp/bench-venv --python python3.14
uv pip install --python /tmp/bench-venv/bin/python \
    maturin pytest pytest-benchmark certifi setuptools cffi
uv pip install --python /tmp/bench-venv/bin/python -e vectors/

# baseline (main branch)
git checkout main
cp tests/bench/test_x509.py /tmp/bench_test.py   # copy new benchmarks over
VIRTUAL_ENV=/tmp/bench-venv maturin develop --release
/tmp/bench-venv/bin/python -m pytest tests/bench/test_x509.py \
    -k "subject or issuer or public_key or signature or crl_serial or ocsp" \
    --benchmark-json=/tmp/bench_base.json --benchmark-enable \
    --benchmark-warmup=on --benchmark-min-rounds=200 -q

# PR branch
git checkout abbra-p-f
VIRTUAL_ENV=/tmp/bench-venv maturin develop --release
/tmp/bench-venv/bin/python -m pytest tests/bench/test_x509.py \
    -k "subject or issuer or public_key or signature or crl_serial or ocsp" \
    --benchmark-json=/tmp/bench_pr.json --benchmark-enable \
    --benchmark-warmup=on --benchmark-min-rounds=200 -q

python3 .github/bin/compare_benchmarks.py /tmp/bench_base.json /tmp/bench_pr.json
```

abbra and others added 9 commits March 8, 2026 13:24
The existing load benchmarks create a fresh object each iteration, so
the cache is always cold and caching optimisations show no benefit there.
Add benchmarks that construct the object once and then repeatedly call
the getter, exercising the warm-cache path:

  Certificate : subject, issuer, public_key(),
                signature_hash_algorithm, signature_algorithm_oid
  CRL         : issuer, serial-number lookup (hit and miss)
  OCSPRequest : issuer_name_hash, issuer_key_hash,
                hash_algorithm, serial_number (all in one bench)
  OCSPResponse: issuer_key_hash, serial_number,
                signature_hash_algorithm (all in one bench)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
…ect caching

The test assumed cert.subject re-parses the Name on every call, so it
checked each too-long-country warning in its own pytest.warns block.
After subject caching, parse_name runs only once (on the first access)
and emits both COUNTRY_NAME and JURISDICTION_COUNTRY_NAME warnings in a
single call. Subsequent accesses return the cached Name object without
re-parsing, so the second block saw no warnings.

Merge both assertions into a single pytest.warns block, which correctly
captures all warnings emitted during the first (and only) parse.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
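The single-emission behaviour this commit adapts the test to can be sketched with the stdlib warnings module; parse_once is a hypothetical stand-in for the first cert.subject access:

```python
import warnings

def parse_once():
    # Stand-in for the first (and only) parse_name run: once the subject
    # is cached, both warnings are emitted during this single call.
    warnings.warn("country name too long", UserWarning)
    warnings.warn("jurisdiction country name too long", UserWarning)
    return "parsed-name"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    name = parse_once()  # first access: parses and warns
    cached = name        # later accesses return the cache, warning nothing

# A single capture block sees both warnings; a second block after this
# point would see none, which is why the two pytest.warns blocks had to
# be merged into one.
assert len(caught) == 2
```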
Wrap the attributes getter in PyOnceLock so the expensive loop over
ASN.1 attributes (OID conversion, PyBytes allocation, Attributes
construction) runs at most once per CertificateSigningRequest object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
Wrap issuer_name_hash, issuer_key_hash, hash_algorithm, and
serial_number getters in PyOnceLock so the allocations (PyBytes
construction, integer conversion, hash-object instantiation) happen
at most once per OCSPRequest object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
…gorithm getter results

Wrap the five most-frequently-accessed computed properties in PyOnceLock
so the underlying work (name parsing, public-key loading, OID conversion,
hash-algorithm object construction) runs at most once per Certificate
object regardless of how many times callers read the attribute.

Also update all Certificate struct construction sites (pkcs7.rs,
ocsp_resp.rs) to initialise the new cache fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
The old implementation used index-based .nth(i) over a freshly-cloned
iterator per certificate, making the total work O(n²) in the number of
embedded certs. Also, each call rebuilt the Python list from scratch.

Replace with a single linear pass using asn1::write_single to obtain
independent DER bytes for each certificate (avoiding the need for the
unsafe map_arc_data_ocsp_response helper), then wrap the built PyList
in a PyOnceLock so subsequent calls return the cached object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
OCSPSingleResponse lacked an extensions getter entirely. Add one backed
by a PyOnceLock so the extension-parsing work runs at most once per
response object. Handles SCT and CRL entry extensions via the shared
parse_and_cache_extensions helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
Wrap the issuer, signature_algorithm_oid, and signature_hash_algorithm
getters in PyOnceLock so name parsing and OID/hash-object construction
each run at most once per CertificateRevocationList object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
get_revoked_certificate_by_serial_number previously iterated over every
revoked certificate on each call (O(n)). Build a HashMap<Vec<u8>,
OwnedRevokedCertificate> on first call using the existing iterator
infrastructure, then answer subsequent lookups in O(1).

Also removes the now-unused try_map_crl_to_revoked_cert unsafe helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
…icates caching

OCSPSingleResponse.extensions was added in commit 986298b but had no
tests. Add four tests in TestOCSPResponse:

* test_single_response_extensions_empty – a typical response with no
  per-SingleResponse extensions returns an empty Extensions object and
  the result is the same cached object on repeated access.

* test_single_response_extensions_sct – resp-sct-extension.der carries
  an SCT list in the raw_single_extensions field; verify it is exposed
  via the new getter on the OCSPSingleResponse iterator item.

* test_single_response_extensions_reason – resp-single-extension-reason.der
  carries a CRLReason; verify it surfaces correctly.

* test_certificates_cached – OCSPResponse.certificates is cached behind a
  PyOnceLock; verify that two successive accesses return the identical
  Python list object (is-identity check).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>

alex commented Mar 8, 2026

Thanks for submitting this -- for ease of review, can you split this into a few smaller PRs? My suggestion would be to start with splitting out:

  1. The subject/issuer properties
  2. The public key properties

and we can go from there. Thanks


abbra commented Mar 8, 2026

@alex thanks, I opened #14442 for the first one. Since all other PRs would depend on the previous ones being merged, should I wait with the remaining ones?

@reaperhulk
Member

Yeah, since GH hasn’t shipped dependent PRs yet, you should just submit one and, once it merges, rebase and submit the next.

